Object Detection

Description

The object detection tool is a powerful feature that enables the user to identify and locate various objects within images. It utilises a model pre-trained on a large dataset to localise and classify different objects.

By applying this tool to an image, the user can automatically localise and identify objects present in the image. The object detection tool offers a convenient and efficient way to perform object recognition tasks without the need for manual intervention or extensive personnel training.

Settings

Model Folder Path

The model folder path is the path to the folder containing the model files used to detect objects in the images. It can be a local path or a remote path.

See Model Folder Structure for more information about how to structure the model folder.
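
As a rough illustration only (not Zene's internal code), a TensorFlow 1 model folder like those described below could be loaded with OpenCV's DNN module. The folder and file names here (`my_model`, `frozen_inference_graph.pb`, `graph.pbtxt`) are hypothetical:

```python
import cv2

# Hypothetical model folder containing a TensorFlow 1 frozen graph (.pb)
# and its text graph definition (.pbtxt), per the tables below.
model_folder = "D:/my_model"

net = cv2.dnn.readNetFromTensorflow(
    model_folder + "/frozen_inference_graph.pb",  # model weights + graph
    model_folder + "/graph.pbtxt",                # graph configuration
)
```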

info

Supported Architectures

Here are the supported frameworks and architectures:

TensorFlow 1

*TensorFlow 1 architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| Faster R-CNN | Inception V2 | 1000x1000 | .pb & .pbtxt |
| Faster R-CNN | ResNet50 | 1000x1000 | .pb & .pbtxt |
| SSD | Inception V2 | 300x300 | .pb & .pbtxt |
| SSD | MobileNet V2 | 300x300 | .pb & .pbtxt |
| SSDLite | MobileNet V2 | 300x300 | .pb & .pbtxt |
DarkNet

*DarkNet architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| YOLO Tiny V3 | DarkNet | 416x416 | .weights & .cfg |
| YOLO V3 | DarkNet | 416x416 | .weights & .cfg |
| YOLO V4 | CSPDarkNet | 416x416 | .weights & .cfg |
TensorFlow 2 - ONNX

*TensorFlow 2 architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| CenterNet | Hourglass104 | 512x512 | .onnx |
| CenterNet | Hourglass104 | 1024x1024 | .onnx |
| EfficientDet-D0 | EfficientNet-B0 | 512x512 | .onnx |
| EfficientDet-D1 | EfficientNet-B1 | 640x640 | .onnx |
| EfficientDet-D2 | EfficientNet-B2 | 768x768 | .onnx |
| EfficientDet-D3 | EfficientNet-B3 | 896x896 | .onnx |
| EfficientDet-D4 | EfficientNet-B4 | 1024x1024 | .onnx |
| EfficientDet-D5 | EfficientNet-B5 | 1280x1280 | .onnx |
| EfficientDet-D6 | EfficientNet-B6 | 1280x1280 | .onnx |
| EfficientDet-D7 | EfficientNet-B7 | 1536x1536 | .onnx |
| SSD | MobileNet V2 | 320x320 | .onnx |
| SSD | MobileNet V2 | 640x640 | .onnx |
| SSD | MobileNet V2 | 1024x1024 | .onnx |
| Faster R-CNN | ResNet50 | 640x640 | .onnx |
| Faster R-CNN | ResNet50 | 1024x1024 | .onnx |
| Faster R-CNN | ResNet50 | 1333x800 | .onnx |
MMYolo

*MMYolo architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| YOLOv5-n | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-s | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-m | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-l | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-x | CSPDarkNet (P5) | 640x640 | .onnx |
| YOLOv5-n | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv5-s | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv5-m | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv5-l | CSPDarkNet (P6) | 1280x1280 | .onnx |
| YOLOv6-n | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-t | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-s | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-m | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv6-l | EfficientRep (P5) | 640x640 | .onnx |
| YOLOv7-tiny | CSPDarknet53 (P5) | 640x640 | .onnx |
| YOLOv7-l | CSPDarknet53 (P5) | 640x640 | .onnx |
| YOLOv7-x | CSPDarknet53 (P5) | 640x640 | .onnx |
| YOLOv7-w | CSPDarknet53 (P6) | 1280x1280 | .onnx |
| YOLOv7-d | CSPDarknet53 (P6) | 1280x1280 | .onnx |
| YOLOv8-n | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-s | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-m | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-l | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOv8-x | CSPDarknet53+PANet (P5) | 640x640 | .onnx |
| YOLOX-t | Swin Transformer | 640x640 | .onnx |
| YOLOX-s | Swin Transformer | 640x640 | .onnx |
| YOLOX-m | Swin Transformer | 640x640 | .onnx |
| YOLOX-l | Swin Transformer | 640x640 | .onnx |
| YOLOX-x | Swin Transformer | 640x640 | .onnx |
| PPYOLOE+-s | PVTv2 | 640x640 | .onnx |
| PPYOLOE+-m | PVTv2 | 640x640 | .onnx |
| PPYOLOE+-l | PVTv2 | 640x640 | .onnx |
| PPYOLOE+-x | PVTv2 | 640x640 | .onnx |
| RTMDet-tiny | ResNet-FPN | 640x640 | .onnx |
| RTMDet-s | ResNet-FPN | 640x640 | .onnx |
| RTMDet-m | ResNet-FPN | 640x640 | .onnx |
| RTMDet-l | ResNet-FPN | 640x640 | .onnx |
| RTMDet-x | ResNet-FPN | 640x640 | .onnx |
MMDetection

*MMDetection architectures*

| Architecture | Backbone | Input Size | Model File Type |
| ------------ | -------- | ---------- | --------------- |
| RTMDet-tiny | ResNet-FPN | 640x640 | .onnx |
| RTMDet-s | ResNet-FPN | 640x640 | .onnx |
| RTMDet-m | ResNet-FPN | 640x640 | .onnx |
| RTMDet-l | ResNet-FPN | 640x640 | .onnx |
| RTMDet-x | ResNet-FPN | 640x640 | .onnx |
| YOLOX-t | Swin Transformer | 640x640 | .onnx |
| YOLOX-s | Swin Transformer | 640x640 | .onnx |
| YOLOX-m | Swin Transformer | 640x640 | .onnx |
| YOLOX-l | Swin Transformer | 640x640 | .onnx |
| YOLOX-x | Swin Transformer | 640x640 | .onnx |
| CenterNet | ResNet-18 | 1024x1024 | .onnx |
| CenterNet | ResNet-50 | 1024x1024 | .onnx |
| CenterNet | ResNet-101 | 1024x1024 | .onnx |
| CenterNet | ResNet-50 | 1088x800 | .onnx |
| Faster R-CNN | R-50-C4 (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-50-DC5 (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-50-FPN (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-50-FPN (PyTorch) | 1088x800 | .onnx |
| Faster R-CNN | R-101-FPN (Caffe) | 1088x800 | .onnx |
| Faster R-CNN | R-101-FPN (PyTorch) | 1088x800 | .onnx |
| Faster R-CNN | X-101-32x4d-FPN (PyTorch) | 1088x800 | .onnx |
| Faster R-CNN | X-101-64x4d-FPN (PyTorch) | 1088x800 | .onnx |
| Swin (Mask R-CNN) | Swin-T | 1088x800 | .onnx |
| Cascade R-CNN | R-50-FPN (Caffe) | 1088x800 | .onnx |
| Cascade R-CNN | R-50-FPN (PyTorch) | 1088x800 | .onnx |
| Cascade R-CNN | R-101-FPN (Caffe) | 1088x800 | .onnx |
| Cascade R-CNN | R-101-FPN (PyTorch) | 1088x800 | .onnx |
| Cascade R-CNN | X-101-32x4d-FPN (PyTorch) | 1088x800 | .onnx |
| Cascade R-CNN | X-101-64x4d-FPN (PyTorch) | 1088x800 | .onnx |
Others

*Others architectures*

| Architecture | Backbone | Input Size | Model File Type | Link |
| ------------ | -------- | ---------- | --------------- | ---- |
| YOLO-World | DarkNet | 640x640 | .onnx | link |
tip

Zene UI

[Image: Setting Example]

The user can choose the model folder path by clicking on the folder icon and selecting the folder containing the model files (see File Explorer for more information).

In this example, the model folder is located at D:/ontf2_faster_rcnn_resnet50_v1_640x640. Below the model path selector, the user can see the Model Name (the folder name) and the model Framework.

Architecture

The architecture is the type of model used to detect objects in the images. See Supported Architectures for more information.

NMS Threshold

[Image: Example - 0.01 NMS]

[Image: Example - 0.99 NMS]

The NMS threshold is the overlap (IoU) threshold used for non-maximum suppression. Non-maximum suppression reduces the number of bounding boxes by discarding boxes that overlap too much with a higher-scoring box. A lower NMS threshold suppresses more overlapping boxes and therefore yields fewer detections; a higher threshold tolerates more overlap and yields more detections, potentially including duplicates.
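
A minimal sketch of the effect, using OpenCV's `cv2.dnn.NMSBoxes` as a stand-in (an assumption; Zene's own NMS implementation may differ):

```python
import cv2

# Boxes as [x, y, width, height] with matching confidence scores.
# The first two boxes overlap almost completely.
boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 80, 80]]
scores = [0.9, 0.8, 0.7]

# Low NMS threshold (0.01): almost any overlap is suppressed,
# so the two near-identical boxes collapse into one.
print(cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.01))  # keeps boxes 0 and 2

# High NMS threshold (0.99): heavy overlap is tolerated,
# so all three boxes survive.
print(cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.99))  # keeps boxes 0, 1 and 2
```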

Confidence Threshold

[Image: Example - 0.01 Confidence]

[Image: Example - 0.99 Confidence]

The confidence threshold is the minimum confidence score a detection should have to be considered valid. Detections with confidence scores below this threshold will be discarded. A higher confidence threshold will result in fewer detections, but with higher accuracy, whereas a lower threshold will result in more detections, but with potentially lower accuracy.
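
For example, filtering a hypothetical list of (label, score) detections at a 0.5 threshold:

```python
# Keep only detections whose score clears the confidence threshold.
detections = [("person", 0.92), ("person", 0.40), ("car", 0.75)]
threshold = 0.5

valid = [(label, score) for label, score in detections if score >= threshold]
print(valid)  # [('person', 0.92), ('car', 0.75)]
```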

Split Frame

Whether to split the frame into multiple frames before detecting objects. This is useful when the image is too large to be processed by the model.
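
One way such splitting is commonly implemented (a sketch under assumptions, not necessarily how Zene tiles frames):

```python
import numpy as np

def split_frame(image: np.ndarray, tile: int = 640):
    """Yield (x_offset, y_offset, crop) tiles covering the whole image."""
    height, width = image.shape[:2]
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, image[y:y + tile, x:x + tile]

# Detections from each tile are offset by (x_offset, y_offset) and merged,
# typically with a final NMS pass to de-duplicate boxes on tile borders.
```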

Transform Contains

A list of configurations for transforming a detected primary object into another object based on whether there are other objects within it. For example, to transform a detected person object into a group object when there are multiple person objects within the primary person object, the user can add a configuration to the Transform Contains list.

See Detected Object Transformation Contain for more information.
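
A simplified sketch of the idea (the configuration fields `source`, `target` and `minimum` are hypothetical names, not Zene's actual schema):

```python
def contains(outer, inner):
    """True if box `inner` lies fully inside box `outer` (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def transform_contains(detections, source="person", target="group", minimum=2):
    """Relabel a `source` box as `target` when it contains `minimum`+ others."""
    for det in detections:
        if det["label"] != source:
            continue
        inside = [d for d in detections
                  if d is not det and contains(det["box"], d["box"])]
        if len(inside) >= minimum:
            det["label"] = target
    return detections
```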

Transform Labels

A list of class labels' original names and their transformed names. Labels are renamed after the Transform Contains step, so labels produced by that transformation are also affected.
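
Conceptually this is a simple name mapping; the labels below are illustrative:

```python
# Hypothetical original -> transformed label names.
transform_labels = {"person": "pedestrian", "car": "vehicle"}

detections = [{"label": "person"}, {"label": "car"}, {"label": "dog"}]
for det in detections:
    # Unmapped labels keep their original name.
    det["label"] = transform_labels.get(det["label"], det["label"])

print([d["label"] for d in detections])  # ['pedestrian', 'vehicle', 'dog']
```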

Keep Labels

The keep labels are the labels that will be kept when detecting objects. For example, if the user wants to keep only the person label, they can add person to the keep labels list.

Ignore Labels

The ignore labels are the labels that will be ignored when detecting objects. For example, if the user wants to ignore the person label, they can add person to the ignore labels list.
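
A sketch of how the Keep Labels and Ignore Labels filters could combine (assuming an empty keep list means "keep everything"; Zene's exact semantics may differ):

```python
detections = [{"label": "person"}, {"label": "car"}, {"label": "dog"}]

keep_labels = {"person", "car"}  # keep only these (empty set = keep all)
ignore_labels = {"car"}          # always drop these

filtered = [d for d in detections
            if (not keep_labels or d["label"] in keep_labels)
            and d["label"] not in ignore_labels]

print([d["label"] for d in filtered])  # ['person']
```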

Advanced Settings

Whether to enable advanced settings. The advanced settings allow the user to have more control over the model parameters.

caution

Please use the advanced settings with caution as changing the parameters could result in unexpected behaviour.

To RGB

Advanced Settings

Convert from BGR colours (the default in Zene) to RGB colours. This is useful when using models trained on images with a different channel order.

Scale

Advanced Settings

The scale is a factor used to normalise the input image before it is passed to the model: each pixel value is divided by the scale factor. A higher scale factor therefore produces smaller normalised pixel values, while a lower scale factor produces larger ones.

Mean

Advanced Settings

The mean is a value used to normalise the input image before it is passed to the model: the mean is subtracted from each pixel value. A higher mean value shifts the normalised image darker, while a lower mean value leaves it brighter.

Input Pixel (width and height)

Advanced Settings

The input pixel setting specifies the width and height of the image that is passed to the model.
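
The four settings above (To RGB, Scale, Mean and Input Pixel) map onto a standard DNN preprocessing step. As an illustration only, here is how they combine in OpenCV's `cv2.dnn.blobFromImage`; Zene's exact pipeline may differ:

```python
import cv2
import numpy as np

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in BGR frame

blob = cv2.dnn.blobFromImage(
    image,
    scalefactor=1.0 / 255.0,     # Scale: divide pixel values by 255
    size=(640, 640),             # Input Pixel: model input width x height
    mean=(127.5, 127.5, 127.5),  # Mean: subtracted from each channel
    swapRB=True,                 # To RGB: swap BGR (Zene default) to RGB
)
# blobFromImage resizes, subtracts the mean, applies the scale factor and
# swaps channels, producing an NCHW float tensor ready for inference.
```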

Display Results

Overlay Results

Whether to draw the results on top of the image frame.
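
As a sketch of what overlaying typically involves (not Zene's rendering code), each detection could be drawn with OpenCV:

```python
import cv2

def overlay_results(frame, detections):
    """Draw each detection as a green box with a 'label score' caption."""
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        caption = f'{det["label"]} {det["score"]:.2f}'
        cv2.putText(frame, caption, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame
```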